Semantic based generation of Japanese German translation system
Abstract
Project SEMSYN has reached a state in which a prototype system generates German texts on the basis of the semantic representation produced from Japanese texts by ATLAS/II of Fujitsu Laboratory. This paper describes some problems that are specific to our semantic based approach and some results of the evaluation study carried out by the Germanist group.

I. Generation procedure in SEMSYN

This section summarizes the SEMSYN generation procedure. Readers who are more interested in the SEMSYN system are referred to our previous COLING84 paper [1] or the paper submitted to this conference [2].

The generation process begins with the conversion of the semantic networks, each of which represents one sentence, into a so-called IKBS (Instantiated Knowledge Base Schema). The IKBS is an instantiation of case or concept schemata denoted by semantic symbols as nodes in the semantic network. A case schema contains three main description slots: a) roles of cases associated with the semantic symbol, b) transformation rules of schemata, c) choice of German syntactic realization schemata. Triggered by the semantic symbols of the given network, the IKBS specifies the best basic syntactic structure associated with a German word by checking the fillers of roles, and converts them into functional roles within each German syntactic category. A German syntacto-morphological component called SUTRA-S [3], an extended version of SUTRA [4], generates German surface texts from the instantiated syntactic structure called IRS (Instantiated Realization Schemata). Though English-like terms are used for semantic symbols, the choice of a German word associated with each semantic symbol, and its syntactic structure, differ greatly from the corresponding English ones.

II. Some problems of the semantic based translation approach

The semantic based approach has advantages as well as disadvantages, which we anticipated at the beginning of the project.
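The generation procedure summarized in Section I (semantic network, then IKBS, then IRS, then surface text) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `CaseSchema`, `KBS`, `instantiate` and `realize`, the toy entry for `DEVELOPMENT`, and the use of a plain string template in place of a realization schema. Slot b) (transformation rules) is left out, the role filler is given as ready-made German, and the morphological work of SUTRA-S is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CaseSchema:
    """Hypothetical stand-in for one case schema of the knowledge base."""
    german_word: str   # German word chosen for the semantic symbol
    roles: dict        # slot (a): case role -> German functional role
    realization: str   # slot (c): realization schema, here a plain template


# Toy knowledge base keyed by semantic symbol (invented entry).
KBS = {
    "DEVELOPMENT": CaseSchema(
        german_word="die Entwicklung",
        roles={"object": "genitive_attribute"},
        realization="{head} {genitive_attribute}",
    ),
}


def instantiate(network):
    """Semantic network of one sentence -> IKBS-like instantiated schema."""
    schema = KBS[network["symbol"]]
    # Convert case-role fillers into German functional roles.
    functional_roles = {
        schema.roles[role]: filler
        for role, filler in network["roles"].items()
        if role in schema.roles
    }
    return schema, functional_roles


def realize(schema, functional_roles):
    """IRS-like structure -> surface string (stands in for SUTRA-S)."""
    return schema.realization.format(head=schema.german_word, **functional_roles)


network = {"symbol": "DEVELOPMENT", "roles": {"object": "des Kerns"}}
surface = realize(*instantiate(network))
print(surface)
```

The point of the sketch is only the division of labour: the semantic symbol selects the schema, the schema maps case roles to functional roles, and a separate realization step produces the German string.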
Theoretically speaking, one reason why we adopted a semantic based approach rather than the syntactic transfer approach lies in the cultural differences and communication barriers between the two project groups that cooperate to build the translation system. Understanding the content of the original sentence from the given semantic representation, the generation group can express it in a way that is common in its mother tongue, relatively free from syntactic restrictions and from lexically corresponding terminology. It is well known that a language of one culture can only be interpreted, not literally translated, into the languages of different cultures, as would be possible within the same cultural sphere. In fact, we often took advantage of this in our generation system.

On the other hand, exactly this freedom frequently turned out to be a disadvantage on the generation side. Dealing with real data (titles of scientific papers in the field of information technology from the Japanese database JOIS), we encountered problems we had not expected and recognized the limits of our approach. In the following we describe some of these problems:

1) ... Japanese original text

We also had to deal with the well-known problem of the lack of articles (definite or indefinite) and of number distinction (singular or plural) for nouns as well as verbs. We embedded some heuristic rules in the KBS and the dictionary to add these syntactic features where they cannot be omitted in the German text. Deeper semantics still governs these decisions, but it cannot be represented in general, except for very limited cases. The heuristic rules are based on our ambiguity conservation principle: we keep the ambiguity of the input text as far as possible, to avoid any active selection of one alternative that might lead to a wrong expression from the viewpoint of the author of the title.
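A minimal Python sketch of how such a heuristic might respect the ambiguity conservation principle: a feature left unspecified by the Japanese input is filled in only when the German construction requires it, and every value chosen this way is recorded as an assumption. The function `fill_features`, the `FALLBACK` table and the dictionary layout are hypothetical and are not taken from the SEMSYN implementation.

```python
# Illustrative fallback values only; not SEMSYN's actual rules.
FALLBACK = {"number": "singular", "article": "indefinite"}


def fill_features(noun, required):
    """Fill unspecified features only where the German text requires them.

    noun: dict with "features" (None means unspecified in the input) and an
          optional "dictionary_default" supplied by the lexicon entry.
    required: names of the features the German construction cannot omit.
    Returns the completed features and the list of features that were assumed.
    """
    features = dict(noun["features"])
    defaults = noun.get("dictionary_default", {})
    assumed = []
    for name in required:
        if features.get(name) is None:
            # Active decision: prefer the dictionary entry, else the fallback.
            features[name] = defaults.get(name, FALLBACK[name])
            assumed.append(name)
    return features, assumed


kernel = {
    "features": {"number": None, "article": None},
    "dictionary_default": {"article": "definite"},
}

# Nothing is required, so nothing is decided and the ambiguity is kept.
kept, kept_assumed = fill_features(kernel, required=[])

# Both features are required, so both are filled and flagged as assumed.
filled, filled_assumed = fill_features(kernel, required=["number", "article"])
```

Keeping the list of assumed features separate makes it possible to review exactly those decisions that were not licensed by the input.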
The following examples show typical errors in number and articles generated by the present SEMSYN heuristics. They also illustrate how difficult it is to find a trade-off between ambiguity conservation and an active decision inferred from the content:

E.g. 1: [Japanese original title, garbled in the source]
SEMSYN generation: Die Verwendung von kleinen Computern zur Durchfuehrung von grossen graphischen Programmen (The application of small computers for the execution of large graphic programs)
Comment: The author of the paper will discuss how to use a small computer to execute a very large graphic package, so readers may naturally assume one small computer rather than many small computers, though the latter reading is possible. On the other hand, it is generally assumed that a computer processes many programs. For this reason the plural is more natural in the latter case than in the former. However, it is bad German to have neither a number feature nor an article, as is the case in the original text.

E.g. 2: [Japanese original title, garbled in the source]
SEMSYN generation: Die Entwicklung des Kerns beim Betriebssystem vom verteilten Typ fuer real-time Anwendungen.
Correct German: Die Entwicklung des Kerns eines verteilten Betriebssystems fuer Echtzeitanwendungen (The development of the kernel in the operating system of the distributed type for real-time applications)
Comment: It is assumed that the author developed the kernel of one distributed OS, rather than of many distributed OSs, for many applications.

2) ... of functions

One of the hard problems we expected in our semantic
Similar resources
Language Generation From Conceptual Structure: Synthesis Of German In A Japanese/German MT Project
This paper describes the current state of the SEMSYN project, whose goal is to develop a module for generation of German from a semantic representation. The first application of this module is within the framework of a Japanese/German machine translation project. The generation process is organized into three stages that use distinct knowledge sources. The first stage is conceptually oriente...
Interactive Translation of Conversational Speech
We present JANUS-II, a large scale system effort aimed at interactive spoken language translation. JANUS-II now accepts spontaneous conversational speech in a limited domain in English, German or Spanish and produces output in German, English, Spanish, Japanese and Korean. The challenges of coarticulated, disfluent, ill-formed speech are manifold, and have required advances in acoustic modeling...
Translation using Minimal
We describe minimal recursion semantics (MRS), a framework for semantics within HPSG, which considerably simplifies transfer and generation. We discuss why, in general, a semantic representation with minimal structure is desirable for transfer and illustrate how a descriptively adequate representation with a non-recursive structure may be achieved. The paper illustrates the application of MRS to...
Treating Multiple-Subject Constructions in a Constraint-Based MT-System
In Korean and Japanese some sentences are to be found which contain more than one nominative-NP. Such constructions are called ‘multiple-subject constructions’ or ‘double-subject constructions’. They do not only raise a question about their syntactic and semantic nature but also cause such problems as structural changes in MT. They must be considered in designing an MT-System between two typolo...
The SEMSYN Generation System: Ingredients, Applications, Prospects
We report on the current status of the SEMSYN generation system. This system, initially implemented within a Japanese to German MT project, has been applied to a variety of generation tasks both within MT and text generation. We will work out how these applications enhanced the system's capacities. 1. THE STARTING POINT The SEMSYN project began in 1983 with an MT application as starting poi...